A Corpus of Natural Multimodal Spatial Scene Descriptions
Abstract
We present a corpus of multimodal spatial descriptions of the kind that commonly occur in route-giving tasks. Participants provided natural spatial scene descriptions using speech and abstract deictic/iconic hand gestures. The scenes were composed of simple geometric objects. While the language denotes object shape and visual properties (e.g., colour), the abstract deictic gestures “placed” objects in gesture space to denote the spatial relations between them. These gestures receive defined meanings only in combination with speech. Hence, the presented corpus goes beyond previous work on gestures in multimodal interfaces, which focuses either on gestures with predefined meanings (multimodal commands) or on hand motion data without accompanying speech. At the same time, the setting is more constrained than full human/human interaction, which makes the resulting data more amenable to computational analysis and more directly usable for learning natural computer interfaces. Our preliminary analysis shows that the co-verbal deictic gestures in the corpus reflect the spatial configurations of the objects, and that the use of gesture space and the verbal descriptions vary. The verbal descriptions and hand motion data we provide will enable modelling the interpretation of natural multimodal descriptions with machine learning methods, as well as other tasks such as generating natural multimodal spatial descriptions.
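The abstract states that the deictic gestures “place” objects in gesture space and that these placements reflect the objects' spatial configuration. The following is a minimal sketch of how such placements might be turned into pairwise spatial relations. It assumes that each gesture stroke has already been aligned with the object phrase it accompanies and reduced to a single hand position in a speaker-centred frame (x towards the speaker's right, z away from the speaker). The names `GesturePlacement`, `spatial_relation` and `pairwise_relations`, the coordinate frame and the 5 cm threshold are all hypothetical illustrations. They are not the corpus's actual data format or the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GesturePlacement:
    """Hypothetical record: an object phrase from the speech paired with the
    hand position at the stroke of the deictic gesture that accompanies it."""
    object_phrase: str  # e.g. "the red cube"
    x: float  # lateral hand position in metres; positive = speaker's right (assumed frame)
    z: float  # depth hand position in metres; positive = away from the speaker (assumed frame)


def spatial_relation(a: GesturePlacement, b: GesturePlacement, threshold: float = 0.05) -> str:
    """Relation of a relative to b from the speaker's viewpoint.

    Offsets smaller than `threshold` (a hypothetical 5 cm tolerance) are
    treated as 'no difference' on that axis.
    """
    dx = a.x - b.x
    dz = a.z - b.z
    parts = []
    # Depth axis: an object placed farther from the speaker reads as "behind".
    if dz > threshold:
        parts.append("behind")
    elif dz < -threshold:
        parts.append("in front of")
    # Lateral axis: larger x means further to the speaker's right.
    if dx > threshold:
        parts.append("right of")
    elif dx < -threshold:
        parts.append("left of")
    # If neither axis differs noticeably, fall back to a generic label.
    return " and ".join(parts) if parts else "near"


def pairwise_relations(placements: list[GesturePlacement],
                       threshold: float = 0.05) -> list[tuple[str, str, str]]:
    """Return (object, relation, reference object) for every pair i < j."""
    triples = []
    for i, a in enumerate(placements):
        for b in placements[i + 1:]:
            triples.append((a.object_phrase, spatial_relation(a, b, threshold), b.object_phrase))
    return triples


if __name__ == "__main__":
    # Invented example placements, not data from the corpus.
    scene = [
        GesturePlacement("the red cube", 0.20, 0.30),
        GesturePlacement("the blue sphere", -0.10, 0.32),
        GesturePlacement("the green cone", -0.12, 0.55),
    ]
    for triple in pairwise_relations(scene):
        print(triple)
```

Run as a script, the demo prints `('the red cube', 'right of', 'the blue sphere')`, `('the red cube', 'in front of and right of', 'the green cone')` and `('the blue sphere', 'in front of', 'the green cone')`. A fixed per-axis threshold is the simplest possible choice. Because the abstract reports variation in the use of gesture space, a learned model would likely need to normalise placements before comparing them, which this sketch does not attempt.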
Similar resources
Building and Applying Perceptually-Grounded Representations of Multimodal Scene Descriptions
“You see a red building, and then behind that [gesture] you turn left”. Hearing this kind of route description, only to apply its instructions at a later time, is a difficult task. The content of the description has to be memorised, and then, when the time comes to make use of it, be applied to the present situation. This makes for a good test case for a model of situated dialogue understanding...
Towards Metadata Descriptions for Multimodal Corpora of Natural Communication Data
Metadata play an important role for successful corpus management and reusability of corpora. For linguistic resources there already exist a large amount of metadata descriptions and metadata schemes. However, not much work has been done to develop metadata for the particular structure of multimodal corpora, yet. In this paper we provide a review of existing metadata profiles for multimodal data...
A Scene Corpus for Training and Testing Spatial Communication Systems
It is argued that a ‘scene corpus’ would be a useful tool for the training and testing of systems for grounded spatial communication in the same way that text corpora have been used for training and assessing other language processing systems. Such a scene corpus would need to allow a full range of spatial relationships to be expressed over a range of scale spaces. The scenes should be sufficie...
Natural Spatial Language Generation for Indoor Robot
This paper proposes a spatial language generation system to find short, accurate and human-like descriptions for robots to communicate with a human user about the location of an object. The research focuses on building static spatial descriptions which use reference objects and directions to describe spatial relations. The system generates a natural spatial description in three steps. In the fi...
Learning to Interpret and Describe Abstract Scenes
Given a (static) scene, a human can effortlessly describe what is going on (who is doing what to whom, how, and why). The process requires knowledge about the world, how it is perceived, and described. In this paper we study the problem of interpreting and verbalizing visual information using abstract scenes created from collections of clip art images. We propose a model inspired by machine tra...
Publication date: 2018